The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
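As an illustration of the most common workaround reported above, patch-based training replaces full-volume processing with random crops. The sketch below is a minimal, illustrative example only (array shapes, patch size, and function names are assumptions, not taken from the survey):

```python
import numpy as np

def sample_patch(volume, label, patch_size=(64, 64, 64), rng=None):
    """Randomly crop one training patch from a large 3D volume and its label map."""
    rng = rng or np.random.default_rng()
    starts = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, patch_size)]
    slices = tuple(slice(st, st + p) for st, p in zip(starts, patch_size))
    return volume[slices], label[slices]

# Example: draw a small batch of patches instead of feeding the full volume at once.
vol = np.zeros((256, 256, 256), dtype=np.float32)
seg = np.zeros_like(vol, dtype=np.int64)
batch = [sample_patch(vol, seg) for _ in range(4)]
```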
The pretraining-finetuning paradigm has demonstrated great success in NLP and 2D image fields because of the high-quality representation ability and transferability of their pretrained models. However, pretraining such a strong model is difficult in the 3D point cloud field since the training data is limited and point cloud collection is expensive. This paper introduces \textbf{E}fficient \textbf{P}oint \textbf{C}loud \textbf{L}earning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP model. Our EPCL connects the 2D and 3D modalities by semantically aligning the 2D features and point cloud features without paired 2D-3D data. Specifically, the input point cloud is divided into a sequence of tokens and directly fed into the frozen CLIP model to learn point cloud representation. Furthermore, we design a task token to narrow the gap between 2D images and 3D point clouds. Comprehensive experiments on 3D detection, semantic segmentation, classification and few-shot learning demonstrate that the 2D CLIP model can be an efficient point cloud backbone and our method achieves state-of-the-art accuracy on both real-world and synthetic downstream tasks. Code will be available.
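A minimal sketch of the core idea of feeding point-cloud tokens plus a learnable task token through a frozen 2D backbone is given below. The naive point grouping, embedding dimensions, and the stand-in transformer are assumptions for illustration; the actual EPCL work uses the frozen CLIP image encoder rather than the generic encoder shown here:

```python
import torch
import torch.nn as nn

class FrozenBackbonePointLearner(nn.Module):
    """Point tokens + a trainable task token passed through a frozen transformer."""

    def __init__(self, num_groups=128, group_size=32, dim=512, frozen_encoder=None):
        super().__init__()
        self.num_groups, self.group_size = num_groups, group_size
        self.token_embed = nn.Linear(group_size * 3, dim)        # trainable tokenizer/embedding
        self.task_token = nn.Parameter(torch.zeros(1, 1, dim))   # trainable task token
        # Stand-in for the frozen CLIP transformer; its parameters are kept frozen.
        self.encoder = frozen_encoder or nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=6)
        for p in self.encoder.parameters():
            p.requires_grad = False

    def forward(self, points):                                   # points: (B, N, 3)
        B = points.shape[0]
        groups = points[:, : self.num_groups * self.group_size, :]
        groups = groups.reshape(B, self.num_groups, self.group_size * 3)
        tokens = self.token_embed(groups)                        # (B, num_groups, dim)
        tokens = torch.cat([self.task_token.expand(B, -1, -1), tokens], dim=1)
        return self.encoder(tokens)                              # features for a downstream head

model = FrozenBackbonePointLearner()
feats = model(torch.randn(2, 4096, 3))                           # (2, 1 + 128, 512)
```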
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and to provide purpose-specific AI model architectures, transformations, and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical, and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
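As an illustration of the kind of workflow MONAI streamlines, a minimal segmentation setup using its dictionary transforms, a reference network, and a loss might look like the following; file names and hyperparameters are placeholders, not a prescribed configuration:

```python
import torch
from monai.transforms import Compose, LoadImaged, EnsureChannelFirstd, ScaleIntensityd
from monai.networks.nets import UNet
from monai.losses import DiceLoss

# Dictionary-based preprocessing pipeline for paired image/label files.
preprocess = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ScaleIntensityd(keys=["image"]),
])

# A standard 3D U-Net from MONAI's collection of reference architectures.
model = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64, 128, 256), strides=(2, 2, 2, 2),
)
loss_fn = DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# With real files on disk, the pipeline is applied per sample, e.g.:
# sample = preprocess({"image": "case01_ct.nii.gz", "label": "case01_seg.nii.gz"})
```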
Existing detection methods typically use a parameterized bounding box (BBox) to model and detect (horizontal) objects, with an additional rotation angle parameter for rotated objects. We argue that this mechanism has fundamental limitations for building an effective regression loss for rotated detection, especially for high-precision detection (e.g., 0.75). Instead, we propose to model rotated objects as Gaussian distributions. A direct advantage is that our new regression loss, defined via a distance between two Gaussians, e.g., the Kullback-Leibler divergence (KLD), aligns well with the actual detection performance metric, which cannot be well addressed by existing methods. Moreover, the two bottlenecks of boundary discontinuity and the square-like problem also disappear. We further propose an efficient Gaussian-metric-based label assignment strategy to further improve performance. Interestingly, by analyzing the gradients of the BBox parameters under the Gaussian-based KLD loss, we show that these parameters are dynamically updated with interpretable physical meaning, which helps explain the effectiveness of our approach, especially for high-precision detection. We extend our method from 2-D to 3-D with a tailored algorithm design to handle heading estimation, and experimental results with various base detectors on twelve public datasets (2-D/3-D, aerial/text/face images) demonstrate its superiority.
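For reference, the conversion and the divergence behind this modeling can be written in closed form. The notation below follows the standard multivariate-Gaussian convention and is a reconstruction for illustration, not copied from the paper:

```latex
% Rotated box (x, y, w, h, \theta) mapped to a 2-D Gaussian N(\mu, \Sigma):
\mu = (x, y)^\top, \qquad
\Sigma^{1/2} = R(\theta)\,\operatorname{diag}\!\left(\tfrac{w}{2}, \tfrac{h}{2}\right) R(\theta)^\top

% Closed-form KL divergence between two 2-D Gaussians:
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\big)
  = \tfrac{1}{2}\Big[\operatorname{tr}\big(\Sigma_2^{-1}\Sigma_1\big)
  + (\mu_2-\mu_1)^\top \Sigma_2^{-1}(\mu_2-\mu_1)
  - 2 + \ln\tfrac{\det\Sigma_2}{\det\Sigma_1}\Big]
```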
Terahertz ultra-massive multiple-input multiple-output (THz UM-MIMO) is envisioned as one of the key enablers of 6G wireless systems. Due to the joint effect of its large array aperture and small wavelength, the near-field region of THz UM-MIMO is greatly enlarged. The high-dimensional channel of such systems thus consists of a stochastic mixture of far-field and near-field paths, which makes channel estimation extremely challenging. Previous works based on a single-field assumption cannot capture the hybrid far- and near-field features and therefore suffer from significant performance loss. This motivates us to consider hybrid-field channel estimation. We draw inspiration from fixed-point theory to develop an efficient deep-learning-based channel estimator with adaptive complexity and a linear convergence guarantee. Building on classic orthogonal approximate message passing, we transform each iteration into a contraction mapping consisting of a closed-form linear estimator and a neural-network-based nonlinear estimator. The main algorithmic innovation is to apply a fixed-point iteration to compute the channel estimate, while modeling neural networks of arbitrary depth and adapting to the hybrid-field channel conditions. Simulation results validate our theoretical analysis and show significant performance gains over state-of-the-art approaches in both estimation accuracy and convergence rate.
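A minimal sketch of the fixed-point structure described above is shown below; the LMMSE-style linear step and the trivial shrinkage denoiser are placeholders standing in for the paper's closed-form OAMP step and learned denoiser, and all names, dimensions, and the stopping rule are illustrative assumptions:

```python
import numpy as np

def fixed_point_estimate(y, H, denoiser, sigma2=1e-2, tol=1e-5, max_iter=50):
    """Iterate x <- f(x) until the update reaches a (numerical) fixed point."""
    M, N = H.shape
    x = np.zeros(N, dtype=complex)
    # LMMSE-style linear estimator used as the contraction's closed-form linear part.
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T + sigma2 * np.eye(M))
    for _ in range(max_iter):
        r = x + W @ (y - H @ x)          # linear (closed-form) estimation step
        x_new = denoiser(r)              # nonlinear, learned denoising step
        if np.linalg.norm(x_new - x) < tol * max(np.linalg.norm(x), 1.0):
            return x_new                 # adaptive complexity: stop once converged
        x = x_new
    return x

# Example with a toy channel and a shrinkage denoiser standing in for the network.
rng = np.random.default_rng(0)
H = (rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))) / np.sqrt(64)
x_true = rng.standard_normal(64) + 1j * rng.standard_normal(64)
y = H @ x_true
x_hat = fixed_point_estimate(y, H, denoiser=lambda r: 0.9 * r)
```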
Raven's Progressive Matrices (RPMs) are frequently used to evaluate human visual reasoning ability. Researchers have made considerable efforts to develop systems that typically tackle the visual recognition and logical reasoning tasks with black-box, end-to-end convolutional neural networks (CNNs). With the goal of developing a highly explainable solution, we propose one-shot human-understandable reasoning (OS-HURS), a two-step framework consisting of a perception module and a reasoning module, to address the challenges of real-world visual recognition and subsequent logical reasoning. For the reasoning module, we propose a "2+1" formulation that can be better understood by humans and significantly reduces model complexity. As a result, precise reasoning rules can be derived from only a single RPM example, which is infeasible for existing solution approaches. The proposed reasoning module is also able to produce a set of reasoning rules that precisely model human knowledge for solving RPM problems. To validate the proposed method on real-world applications, an RPM-like one-shot frame prediction (ROF) dataset is constructed, in which visual reasoning is performed on RPMs built from real-world video frames instead of synthetic images. Experimental results on various RPM-like datasets demonstrate that the proposed OS-HURS achieves significant and consistent performance gains compared with state-of-the-art models.
Radar gait recognition is robust to light variations and less invasive of privacy. Previous studies typically exploit either spectrograms or cadence velocity diagrams; while the former show time-frequency patterns, the latter encode repetitive frequency patterns. In this work, a dual-stream neural network with attention-based fusion is proposed to fully aggregate the discriminative information from these two representations. Both streams are designed based on the Vision Transformer, which well captures the gait characteristics embedded in these representations. The proposed method is validated on a large benchmark dataset for radar gait recognition, showing that it significantly outperforms state-of-the-art solutions.
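A minimal sketch of the dual-stream, attention-fused design described above follows; the generic transformer encoders, the single cross-attention fusion layer, and all dimensions are illustrative assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class DualStreamFusion(nn.Module):
    """Two token streams (e.g., spectrogram and cadence velocity diagram) fused by attention."""

    def __init__(self, dim=256, depth=4, heads=8, num_classes=10):
        super().__init__()
        def make_stream():
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
                num_layers=depth)
        self.stream_a, self.stream_b = make_stream(), make_stream()
        self.fusion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens_a, tokens_b):           # each: (B, N, dim)
        fa, fb = self.stream_a(tokens_a), self.stream_b(tokens_b)
        # One stream attends to the other; the pooled output feeds the classifier.
        fused, _ = self.fusion(query=fa, key=fb, value=fb)
        return self.head(fused.mean(dim=1))

model = DualStreamFusion()
logits = model(torch.randn(2, 49, 256), torch.randn(2, 49, 256))
```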
This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, a new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing videos compressed by HEVC at a fixed QP, while Track 3 targets enhancing videos compressed by x265 at a fixed bit rate. Besides, Tracks 1 and 3 target improving fidelity (PSNR), and Track 2 targets improving perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12 teams, 8 teams, and 11 teams submitted final results for Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state of the art in video quality enhancement. Homepage of the challenge: https://github.com/renyang-home/ntire21_venh
Raven's Progressive Matrices (RPMs) are frequently used to test human visual reasoning ability. Recent advances in RPM datasets and solution models have partially addressed the challenges of visually understanding RPM questions and logically reasoning about the missing answers. In view of the poor generalization caused by insufficient samples in RPM datasets, we propose an effective scheme, namely Candidate Answer Morphological Mixing (CAM-Mix). CAM-Mix serves as a data augmentation strategy based on gray-scale image morphological mixing, which regularizes various solution methods and overcomes the model overfitting problem. By creating new negative candidate answers that are semantically similar to the correct answer, a more accurate decision boundary can be defined. By applying the proposed data augmentation method, significant and consistent performance improvements are achieved on various RPM-like datasets compared with state-of-the-art models.
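As a rough illustration of gray-scale morphological mixing for generating hard negative candidates, the sketch below blends the correct answer with a distractor using pixel-wise min/max, the lattice operations underlying gray-scale morphology. The exact operator and mixing rule used by CAM-Mix are not given in the abstract, so everything here is an assumption for illustration:

```python
import numpy as np

def morphological_mix(answer_img, distractor_img, alpha=0.5, rng=None):
    """Create a new negative candidate that stays close to the correct answer."""
    rng = rng or np.random.default_rng()
    lo = np.minimum(answer_img, distractor_img)   # pixel-wise infimum of the two images
    hi = np.maximum(answer_img, distractor_img)   # pixel-wise supremum of the two images
    w = rng.uniform(0.0, alpha)                   # random blend weight (illustrative)
    return (1 - w) * hi + w * lo

# Example: augment the candidate set with one extra "hard" negative answer.
answer = np.random.rand(80, 80).astype(np.float32)
distractor = np.random.rand(80, 80).astype(np.float32)
hard_negative = morphological_mix(answer, distractor)
```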
In this paper, we establish a theoretical comparison between the asymptotic mean-squared errors of double Q-learning and Q-learning. Our result builds on an analysis of linear stochastic approximation based on Lyapunov equations and applies to both the tabular setting and linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of double Q-learning is exactly equal to that of Q-learning if double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also present some practical implications of this theoretical observation using simulations.
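For concreteness, the sketch below shows a standard tabular double Q-learning step together with the averaged output considered in the comparison; the toy table, state/action indices, and step-size value are illustrative assumptions:

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha, gamma, rng):
    """One tabular double Q-learning step (in the paper's setting, alpha is
    twice the Q-learning step size and the averaged estimator is the output)."""
    if rng.random() < 0.5:                       # update A, evaluating with B
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:                                        # update B, evaluating with A
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

def averaged_estimate(QA, QB):
    # Output analysed in the comparison: the average of the two estimators.
    return 0.5 * (QA + QB)

# Example usage on a toy table with 5 states and 2 actions.
rng = np.random.default_rng(0)
QA, QB = np.zeros((5, 2)), np.zeros((5, 2))
double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2, alpha=0.2, gamma=0.9, rng=rng)
print(averaged_estimate(QA, QB)[0])
```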